

We claim: 



1. 



A method of estimating the results of a database query, the method 



compnsmg: 



5 



collecting workload information related to the database; 



\ tracing query patterns of queries in the workload to identify the usage of 
tuples in the database during execution of the queries; 

\ determining sample weights based on tuple usage; and 
performing a weighted sampling of the database based upon the sample 



2. The method of claim 1 wherein the weighting sampling is based on a 
probability of usagis of tuples required in executing the workload. 

15 3. The method of claim 2 and further comprising computing an aggregate 
over values in each sampled tuple. 

4. The method of claim 3 wherein the aggregate is computed by multiplying 
each value by the inverse of the probability with which corresponding tuples 

20 were sampled. \ 

5. The method of claim 1 wherein ffie weights are a function of the 
frequency of access of a tuple and the number of queries in the workload that 
access the tuple. \ 

25 \ 

6. The method of claim 1 wherein the tuple usage is stored on a page level. 

7. A machine readable medium having instructions for causing a machine to 
perform a method of estimating the results of a database query , the method 

30 comprising: \ 

collecting workload information related to the database; \ 
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wemhts. 
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tracing query patterns of queries in the workload to identify the usage of 
iples in the database during execution of the queries; 

determining sample weights based on tuple usage; and 
performing a weighted sampling based upon tuple usage. 

8. The machine readable medium of claim 7 wherein the weights aire a 
function ^ the frequency of access of a tuple and the number of queries in the 
workload tiilat access the tuple. 

9. The metnpd of claim 7 wherein the tuple usage is stored on a page level. 




10. A system that^estimates the results of a database query , the method 
comprising: 

a module that collects workload information related to the database; 

a module that traces^uery patterns of queries in the workload to identify 
the usage of tuples in the database during execution of the queries; 

a module that determines\sample weights based on tuple usage; and 

a module that performs a weighted sampling of the database based upon 
the sample weights. 

11. A method of estimating the result^of a database query, the method 
comprising: 

performing a weighted sampling of tu^Jes in the database based on a 
probability of usage of tuples required in execunng a given workload; 
storing the probability for each tuple sampV , 
computing an aggregate over values in each\ampled tuple while 
multiplying by the inverse of the probability with which each tuple was sampled. 



12. The method of claim 1 1 wherein the weights are a tunction of the 

\ 

30 frequency of access of a tuple and the number of queries in tire workload that 
access the tuple. 



13. 



The method of claim 1 1 wherein the tuple usage is stored on\a page level. 
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14. A system that estimates the results of a database query, the system 
comprising: 

means for performing a weighted sampling of tuples in the database 
\hscd on a probability of usage of tuples required in executing a given workload; 
means for storing the probability for each tuple sampled; 
means for computing an aggregate over values in each sampled tuple 
while nik^ltiplying by the inverse of the probability with which each tuple was 
sampled. 

15. A macftme readable medium having instructions for causing a machine to 
perform a methoo^of eistimating the results of a database query, the method 
comprising: 

performing a v^eighted sampling of tuples in the database based on a 
probability of usage of tuples required in executing a given workload; 

storing the probability for each tuple sampled; 

computing an aggregke over values in each sampled tuple while 
multiplying by the inverse of tWe probability with which each tuple was sampled. 



20 16. A method of sampling tuple\ in a database to estimate the answer of an 
aggregation query, the method compriWg: 

determining which tuples are accessed more often during execution of a 
workload; 

sampling the tuples; and 
25 calculating an aggregate of values in the ^mpled tuples as a function of 

which tuples are accessed more often. 



1 7. A method of generating an outlier index for a database and a given 

workload wherein the queries in the workload may have selection or group by 

30 conditions, the method comprising: 

identifying sub-relations of tuples in the database induce^by selection 

and group by conditions in queries in the workload; 

generating a variance for values in each sub-relation; 
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selecting sub-relations having higher variances; and 
generating outiiers from such sub-relations having higher variances. 

1 ^ The method of claim 1 7 and further comprising taking a union of outiiers 
5 genWted from such sub-relations. 

19, The method of claim 17 wherein sub-relations are selected having a 
variance higher than a desired threshold. 

10 20. A machine readable medium having instructions for causing a machine to 
perform a method of generating an outlier index for a database and a given 
workload whereinsthe queries in the workload may have selection or group by 
conditions, the metnpd comprising; 

identifying su'^relations of tuples in the database induced by selection 
15 and group by conditions>in queries in tiie workload; 

generating a variamje for values in each sub-relation; 

selecting sub-relatio^ having higher variances; and 

generating outiiers froni such sub-relations having higher variances. 

20 21. A system that generating amouflier index for a database and a given 
workload wherein tiie queries in the workload may have selection or group by 
conditions, the method comprising: \ 

a module that identifies sub-relations of tuples in the database induced by 
selection and group by conditions in queries m the workload; 
25 a module that generates a variance for values in each sub-relation; 

a module that selects sub-relations having mgher variances; and 
a module that generates outiiers from such sub-relations having higher 
variances. \ 

30 22. A method of generating an outUer index for a database and a given 

workload wherein the queries in the workload may have aggrWation and 

selection or group by conditions, the system comprising: \ 

identifying sub-relations of tuples; \ 
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generating weights for each sub-relation based on workload information; 
generating a weighted variance for values in an aggregation column in 
each si^b-relation; 

^locating memory to sub-relations in proportion to their weighted 
5 vanances;\find 

gene^ting outhers from such sub-relations based on allocated memory. 

23. The method of claim 22 and further comprising a module that takes a 
union of outliers gmierated from such sub-relations. 

10 \ 

24. The system of Maim 22 wherein sub-relations are selected having a 
weighted variance hig;heivthan a desired threshold. 

25. The method of claim x2 wherein the weights are a fiinction of a number 
15 of queries in the workload thatSreference the sub-relation. 

26. A method of generating an ^tlier index for a database and a given 
workload wherein the queries in the workload may have selection or group by 
conditions, the method comprising: 

20 identifying sub-relations of tuples paving values to be aggregated; 

generating weights for sub-relations>based on workload information; 

generating a weighted variance for values in sub-relations; 

selecting sub-relations having higher wekhted variances; and 

generating outliers from such sub-relatioi\having higher weighted^ 
25 variances. 

27. The method of claim 26 and further comprising ticking a union of outliers 
generated from such sub-relations. 



30 28. The method of claim 26 wherein sub-relations are sele6jed having a 
weighted variance higher than a desired threshold. 
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\ 29. The method of claim 26 wherein the weights are a function of a number 
\ of queries in the workload that reference the sub-relation. 

30y. A machine readable medium having instructions for causing a machine to 
5 perrbrm a method of generating an outlier index for a database and a given 
workload wherein the queries in the workload may have selection or group by 
conditions, the method comprising: 

identifying sub-relations of tuples having values to be aggregated; 

generating weights for sub-relations based on workload information; 
10 generating a weighted variance for values in the sub-relations; 

selecting sub-relations having higher weighted variances; and 

generating outUers from such sub-relations having higher weighted 
variances. 

15 31. A system that generates an outlier index for a database and a given 

workload wherein the queries in the workload may have selection or group by 

conditions, the system conH>rising: 

means for identifying sub-relations of tuples having values to be 

aggregated; \ 
20 means for generating weights for each sub-relation based on workload 

information; 

means for generating a weigWd variance for values in each sub-relation; 
means for selecting sub-relations having higher weighted variances; and 
means for generating outliers frOpi such sub-relations having higher 
25 weighted variances. 

32. A method of estimating the results of\ database query over a relation, 
the method comprising: 

defining weights of sub-relations using wWkload information; 
30 calculating a weighted variance for each suWelation; 

allocating memory to sub-relations in proportion to respective weighted 
variances; 
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building an outlier index for each sub-relation in accordance with 
\allocated memory; and 

taking the union of outiier indexes. 

33. \ The method of claim 32 and further comprising retaining sub-relations 
havikg a weighted variance over a desired threshold prior to allocating memory 
to subVelations. 

34. 1% method of claim 32 wherein each weighted variance is a function of 
10 the weight of a sub-relation and the variance of the sub-relation. 

35. A methW of estimating the results of a database and a given workload 
wherein the queries in the workload may have selection conditions, the method 
comprising: 

15 collecting w^kload information related to the database; 

tracing query piattems of queries in the workload to identify the usage of 
tuples in the database duMng execution of tiie queries; 

determining samplk weights based on tuple usage; 

performing a weighted sampling of the database based upon the sample 
20 weights; and 

generating a weighted outlier index. 
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36. The method of claim 35 and farther comprising calculating an aggregate 
based on the samples and the index. 

37. A method of estimating an aggregat'b result of a database and a given 
workload wherein the queries in the workloadNmay have selection or group by 
conditions, the method comprising: \ 

collecting workload information related to me database; 
tracing query patterns of queries in the workload to identify the usage of 
tuples in the database during execution of the queries; 

generating sample weights based on tuple usage; 



performing a weighted sampling based upon tuple 
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generating an aggregate based on the weighted sampling; 
identifying sub-relations of tuples in the database induced by selection 
an^ group by conditions in queries in the workload; 

generating weights for each sub-relation based on workload information; 
generating a weighted variance for values in an aggregation column in 
each sub-relation; 

allocating memory to sub-relations in proportion to their weighted 
variances; 

genei^ting outlier indexes from such sub-relations based on allocated 
10 memory; 

performing a union on the outlier indexes to form an outlier index for the 
relation; 

computing aggregate on the outlier index for the relation; and 
combining tiie aggregate based on the weighted sampling with the aggregate on 
15 the outlier index. 

38. A method of estimating the results of a database and a given workload 
wherein the queries in the wofkload may have selection conditions, the method 
comprising: 

20 building an outlier indexVn outher values; 

building a sample of non-outher values; 

aggregating the outlier value\and non-outlier values and scaling values 
as required. 

25 39. The method of claim 38 wherein nje outlier index is built using 
workload information. 
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40. The method of claim 38 wherein the non^outlier sample is based on 
uniform sampling. 

41 . The method of claim 38 wherein the non-outli^r sample is based on 
weighted workload information. 
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